Benchmark: matmul ukernel vs direct codegen #415
// We will time the run, and print it.
auto start = std::chrono::high_resolution_clock::now();
auto run = kernel(bo_instr, instr_v.size(), bo_a, bo_b, bo_c);
Was this kernel generated with the latest iree-amd-aie? If so, this would be the wrong way to run it. You can copy the changes made to the tests in Xilinx/mlir-aie#1517.
BASE_COMPILATION_FLAGS="-iree-hal-target-backends=amd-aie \
-iree-amd-aie-peano-install-dir=${PEANO} \
-iree-amd-aie-mlir-aie-install-dir=${MLIR_AIE_INSTALL} \
-iree-amd-aie-vitis-install-dir=${VITIS} \
-iree-amd-aie-show-invoked-commands"
This is missing the path to IREE:
-iree-amd-aie-install-dir=$IREE_INSTALL_DIR
Ok thanks. This is a recent flag addition afaik. Maybe I should ask/push for this PR to be landed so it doesn't go stale.
Yes, very recent (it just started failing for me). Having this test in CI wouldn't be entirely out of the question, I don't think. It's a good reference for people trying to do some performance analysis (like myself).
End to end script. Running locally (nuc50)
Direct codegen
Using ukernel
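So the ukernel approach is currently 3x faster end to end. This is a lower bound though (i.e. the core ukernel is probably more than 3x faster). Consider splitting each run as total-time = core-time + `other-time`, where `other-time` is the same in the 2 experiments, as only the instruction memory is different (identical DMA data movement between DDR <-> memtile <-> core). We observed that total-time for direct codegen is 3x total-time for the ukernel, so that core-time for direct codegen is at least 3x core-time for the ukernel.

I think on this phoenix machine, theoretical max is 4 TOPS. So the ukernel approach is at 50% of theoretical max.

Two extremes:
- `other-time` is zero: the core ukernel is exactly 3x faster.
- The ukernel core already runs at theoretical max, so half of the ukernel total-time is `other-time`: the core ukernel is 5x faster.

So performance of the ukernel is between 3x and 5x better than dcg.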
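
To spell out where the 3x and 5x bounds come from, here is a short derivation. The symbols are my own shorthand and do not appear in the PR: $T$ is total time, $c$ is core compute time, $o$ is `other-time`, and $T_{\min}$ is the time the same matmul would take at the theoretical max. It assumes the measured 3x ratio and the 50% of theoretical max figure hold exactly.

```latex
\begin{align}
T_{\mathrm{dcg}} &= c_{\mathrm{dcg}} + o, \qquad
T_{\mathrm{uk}} = c_{\mathrm{uk}} + o, \qquad
T_{\mathrm{dcg}} = 3\,T_{\mathrm{uk}} \\
\frac{c_{\mathrm{dcg}}}{c_{\mathrm{uk}}}
  &= \frac{3\,T_{\mathrm{uk}} - o}{T_{\mathrm{uk}} - o}
   = 3 + \frac{2\,o}{T_{\mathrm{uk}} - o} \;\ge\; 3 \\
T_{\mathrm{uk}} &= 2\,T_{\min}, \quad c_{\mathrm{uk}} \ge T_{\min}
  \;\Longrightarrow\; o \le \tfrac{1}{2}\,T_{\mathrm{uk}} \\
o = 0 &: \; \frac{c_{\mathrm{dcg}}}{c_{\mathrm{uk}}} = 3, \qquad
o = \tfrac{1}{2}\,T_{\mathrm{uk}} : \;
\frac{c_{\mathrm{dcg}}}{c_{\mathrm{uk}}} = 3 + 2 = 5
\end{align}
```

The ratio grows monotonically with $o$ on $0 \le o \le \tfrac{1}{2}\,T_{\mathrm{uk}}$, so the core-level speedup lies anywhere between those two endpoints depending on how much of the ukernel run is actually data movement.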